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Abstract 

This  paper  studies  the  computational  complexity  of  Bayesian  and  quasi-Bayesian  esti- 
mation in  large  samples  carried  out  using  a  basic  Metropolis  random  walk.  The  framework 
covers  cases  where  the  underlying  likelihood  or  extremum  criterion  function  is  possibly  non- 
concave,  discontinuous,  and  of  increasing  dimension.  Using  a  central  limit  framework  to 
provide  structural  restrictions  for  the  problem,  it  is  shown  that  the  algorithm  is  computa- 
tionally efficient.  Specifically,  it  is  shown  that  the  running  time  of  the  algorithm  in  large 
samples  is  bounded  in  probability  by  a  polynomial  in  the  parameter  dimension  d,  and  in 
particular  is  of  stochastic  order  d"  in  the  leading  cases  after  the  burn-in  period.  The  reason 
is  that,  in  large  samples,  a  central  limit  theorem  implies  that  the  posterior  or  quasi-posterior 
approaches  a  normal  density,  which  restricts  the  deviations  from  continuity  and  concavity 
in  a  specific  manner,  so  that  the  computational  complexity  is  polynomial.  An  appfication 
to  exponential  and  curved  exponential  families  of  increasing  dimension  is  given. 
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1.  Introduction 

Markov  Chain  Monte  Carlo  (MCMC)  algorithms  have  dramatically  increased  the  use 
of  Bayesian  and  quasi-Bayesian  methods  for  practical  estimation  and  inference.'^  Bayesian 
methods  rely  on  a  likelihood  formulation,  while  queisi-Bayesian  methods  replace  likelihood 
by  other  criterion  functions.  This  paper  studies  the  computational  complexity  of  a  basic 
MCMC  algorithm  as  both  the  sample  and  parameter  dimension  grow  to  infinity  at  appro- 
priate rates.  The  paper  shows  how  and  when  the  large  sample  asymptotics  places  sufficient 
restrictions  on  the  likelihood  and  criterion  functions  that  guarantee  the  efficient  -  that  is, 
polj'nomial  time  -  computational  complexity  of  these  algorithms.  These  results  suggest 
that  at  least  in  large  samples,  Bayesian  and  Quasi-Bayesian  estimators  can  be  computa- 
tionally efficient  alternatives  to  maximum  likelihood  and  extremum  estimators,  most  of  all 
in  cases  where  hkelihoods  and  criterion  functions  are  non-concave  and  possibly  non-smooth 
in  parameters  of  interest. 

To  motivate  our  analysis,  consider  the  M-estimation  problem,  which  is  a  common  method 
of  estimating  various  kinds  of  regression  models.  The  idea  behind  this  approach  is  to 
maximize  some  criterion  function: 

n 

QnW  =  -^m(r, -(7,:(X^,e)),      eeecM'',  (1.1) 

)=1 

where  Yi  is  the  response  variable,  X,  is  a  vector  of  regressors,  and  g,;  is  a  regression  function. 

In  many  examples,  the  problem  is  nonlinear  and  non-concave,  implying  that  the  argmax 

estimator  may  be  difficult  or  impossible  to  obtain.  For  instance,  in  risk  management  a  major 

problem  is  that  of  constructing  the  estimates  of  Conditional  Value-at-Risk.  In  particular,  the 

problem  is  to  predict  the  a-quantile  of  a  portfolio's  return  Yi  tomorrow,  given  today's  and 

past  available  information  [Xi,  Xi-i, . . .).  This  problem  fits  in  the  M-estimation  framework 

by  taking  function  ?n(-)  to  be  the  asymmetric  absolute  deviation  function'' 

m{u)  =  {a  —  l{u  <  0))u. 

To  reflect  dependence  on  all  past  data  and  accurately  capture  GARCH-like  dependencies, 
leading  research  in  this  area^  considers  recursive  models  of  the  form  qi  =  f{Xi,  qi^i,qi^2,  ■■■',d) 
for  instance,  f{Xi,  qi-i,qi-2,  •••;  d)  =  X'^j  +  piqi-i  +  p2qi-2-  This  implies  a  highly  non-linear, 


■^See  e.g.  books  of  Casella  and  Robert  [6],  Chib  [9],  Geweke  [15],  Liu  [29]  for  detailed  treatments  of  the 
MCMC  methods  and  their  apphcations  in  various  areas  of  statistics,  econometrics,  and  biometrics. 
''See  Koenker  and  Bassett  (1978). 
^E.g.  Engle  and  Maganelli  (2005).  ,        .         ..      ,    '.     :.■,      ,..•,■        • 


recursive  specification  for  the  regression  function  qi{-;  9),  which  in  turn  implies  that  the  cri- 
terion function  used  in  M-estimation  defined  in  (1.1)  is  generally  non-concave.  Furthermore, 
in  this  example,  the  function  Qn{S)  is  non-smooth.  As  a  consequence  the  argmax  estimator 

e  &  &vgmaxQn{e)  (1.2) 

may  be  very  hard  to  obtain.  Figure  1  in  Section  2  illustrates  other  kinds  of  examples  where 
the  argmax  computation  becomes  intractable. 

As  an  alternative  to  argmax  estimation,  consider  the  Quasi-Bayesian  estimator  obtained 
by  integration  in  place  of  optimization: 


6'exp{Q„( 

(1-3) 


L 


eMQn[0')]de' 

This  estimator  may  be  recognized  as  a  quasi-posterior  mean  of  the  quasi-posterior  density 
TT-niO)  DC  expQ„(6').  (Of  course,  when  Qn  is  a  log-likelihood,  the  term  "quasi"  becomes 
redundant.)  This  estimator  is  not  affected  by  local  discontinuities  and  non-concavities  and 
is  often  much  easier  to  compute  in  practice  than  the  argmax  estimator;  see,  for  example, 
the  discussion  in  Chernozhukov  and  Hong  [8]  and  Liu,  Tian,  and  Wei  [28]. 

This  paper  will  show  that  if  the  sample  size  n  grows  to  infinity  and  the  dimension  of  the 
problem  d  does  not  grow  too  quickly  relative  to  the  sample  size,  the  quasi-posterior 

exp{Q„(g)} 

e-XY>{Qn{0')]de'  ^^'^^ 

/e 

will  be  approximately  normal.  This  result  in  turn  leads  to  the  main  claim:  the  estimator 
(1.3)  can  be  computed  using  Markov  Chain  Monte  Carlo  in  polynomial  time,  provided 
the  starting  point  is  drawn  from  the  approximate  support  of  the  quasi-posterior  (1.4). 
As  is  standard  in  tlie  literature,  we  measure  running  time  in  the  irumber  of  evaluations 
of  the  numerator  of  the  quasi-posterior  function  (1.4)  since  this  accounts  for  most  of  the 
computational  burden. 

In  other  words,  when  the  central  hmit  theorem  (CLT)  for  the  quasi-posterior  holds,  the 
above  estimator  is  computationally  tractable.  The  reason  is  that  the  CLT,  in  addition  to 
implying  the  approximate  normahty  and  attractive  estimation  properties  of  the  estimator 
0,  bounds  non-concavities  and  discontinuities  of  Qn{d)  in  a  specific  manner  that  imphes 
that  the  computational  time  is  polynomial  in  the  parameter  dimension  d.  In  particular, 
the  bound  on  the  running  time  of  the  algorithm  is  Op{d^)  in  the  leading  cases  after  the 


so-called  burn-in  period.  Thus,  our  main  insight  is  to  bring  the  structure  implied  by  the 
CLT  into  the  computational  complexity  analysis  of  the  MCMC  algorithm  for  computation 
of  (1.3)  and  sampling  from  (1.4). 

Our  analysis  of  computational  complexity  builds  on  several  fundamental  papers  studying 
the  computational  complexity  of  Metropolis  procedures,  especially  Applegate  and  Kannan 
[1],  Pi'ieze,  Kannan  and  Poison  [14],  Poison  [34],  Kannan,  Lovasz  and  Simonovits  [24],  Kan- 
nan and  Li  [23],  Lovasz  and  Simonovits  [30],  and  Lovasz  and  Vempala  [31,  32,  33].  Many 
of  our  results  and  proofs  rely  upon  and  extend  the  mathematical  tools  previously  devel- 
oped in  these  works.  We  extend  the  complexity  analysis  of  the  previous  literature,  which 
has  focused  on  the  case  of  an  arbitrary  concave  log-likelihood  function,  to  nonconcave  and 
nonsmooth  cases.  The  motivation  is  that  from  a  statistical  point  of  view,  in  concave  set- 
tings it  is  typically  easier  to  compute  a  maximum  likelihood  or  extremum  estimate  than  a 
Bayesian  or  quasi-Bayesian  estimate,  so  the  latter  do  not  necessarily  have  practical  appeal. 
In  contrast,  when  the  log-likelihood  or  quasi-Hkelihood  is  either  nonsmooth,  nonconcave,  or 
both,  Bayesian  and  quasi-Bayesian  estimates  defined  by  integration  are  relatively  attrac- 
tive computationally,  compared  to  maximum  likelihood  or  extremum  estimators  defined  by 
optimization. 

Our  analysis  also  relies  on  statistical  large  sample  theory.  We  invoke  limit  theorems  for 
posteriors  and  quasi-posteriors  for  large  samples  as  n  — >  oo.  These  theorems  are  necessary  to 
support  our  principal  task  -  the  analysis  of  computational  complexity  under  the  restrictions 
of  the  CLT.  As  a  preliminary  step  of  our  computational  analysis,  we  obtain  a  new  CLT 
for  quasi-posteriors  and  posteriors  which  generalizes  the  CLT  previously  derived  in  the 
literature  for  posteriors  and  quasi-posteriors  for  fixed  dimension.  In  particular,  Laplace  c. 
1809,  Bickel  and  Yahav  [4],  Ibragimov  and  Hasminskii  [19],  and  Bunke  and  Milhaud  (1998) 
provided  CLTs  theorems  for  posteriors.  Chernozhukov  and  Hong  [8]  and  Liu,  Tian,  and 
Wei  [28]  provided  CLTs  for  quasi-posteriors  formed  using  various  non-likehhood  criterion 
functions.  In  contrast  to  these  previous  results,  we  allow  for  increasing  dimensions.  Ghosal 
[17]  also  previously  derived  a  CLT  for  posteriors  with  increasing  dimension,  but  only  for 
concave  exponential  families.  We  go  beyond  such  canonical  setup  and  establish  the  CLT 
for  non-concave  and  discontinuous  cases.  We  also  allow  for  general  criterion  functions  in 
place  of  hkelihood  functions.  The  paper  also  illustrates  the  plausibility  of  the  approach 
using  exponential  and  curved  exponential  families.  The  curved  families  arise  for  example 
when  the  data  must  satisfy  additional  moment  restrictions,  as  e.g.  in  Hansen  and  Singleton 


[18],  Chamberlain  [7],  and  Imbens  [20].    The  curved  famihes  fall  outside  the  log-concave 
framework. 

The  rest  of  the  paper  is  organized  as  follows.  In  Section  2,  we  establish  a  generahzed 
version  of  the  Central  Limit  Theorem  for  Bayesian  and  Quasi-Bayesian  estimators.  This 
result  may  be  seen  as  a  generahzation  of  the  classical  Bernstein- Von-Mises  theorem,  in 
that  it  allows  the  parameter  dimension  to  grow  as  the  sample  size  grows,  i.e.  d  — >  oo 
as  n  — >  DO.  In  Section  2,  we  also  formulate  the  main  problem,  which  is  to  characterize 
the  complexity  of  MCMC  sampling  and  integration  as  a  function  of  the  key  parameters 
that  describe  the  deviations  of  the  quasi-posterior  from  the  normal  density.  Section  3 
explores  the  structure  set  forth  in  Section  2  to  find  bounds  on  conductance  and  mixing  time 
of  the  MCMC  algorithm.  Section  4  derives  bounds  on  the  integration  time  of  the  MCMC 
algorithm.  Section  5  considers  an  application  to  a  broad  class  of  curve.d  exponential  famihes, 
which  are  possibly  non-concave  and  discontinuous,  and  verifies  that  our  results  apply  to 
this  class  of  statistical  models.  We  verify  that  high-level  conditions  of  Section  2  follow  from 
primitive  conditions  for  these  models. 

2.  The  Setup  and  The  Problem 

Our  analysis  is  motivated  by  the  problems  of  estimation  and  inference  in  large  samples. 
We  consider  a  "reduced-form"  setup  formulated  in  terms  of  parameters  that  characterize 
local  deviations  from  the  true  statistical  parameter.  ®  The  local  parameter  A  describes 
contiguous  deviations  fi-om  the  true  parameter  and  we  shift  it  by  a  first  order  approximation 
of  the  extremum  estimator  s.  That  is,  for  9  denoting  a  parameter  vector,  9o  the  true  value, 
and  s  —  ^/n{9  —  6o)  the  normalized  extremum  estimator  (or  a  first  order  approximation  to 
it),  we  have  the  local  parameter  A  defined  as 

A  =  ^i{9  -  9o)  -  s. 

The  parameter  space  for  9  is  0,  and  the  parameter  space  for  A  is  therefore  A  =  \/n(9  — 
■^o)  -  s. 

The  corresponding  localized  likehhood  (or  localized  criterion)  function  is  denoted  by  ^(A). 
For  example,  suppose  Ln{9)  is  the  original  Hkelihood  function  in  the  hkelihood  framework 
or,  more  generally,  Ln{9)  is  exp{nQ„(6)}  where  Qn{9)  is  the  criterion  function  in  extremum 


Examples  in  Section  5  further  illustrate  the  connection  between  the  localized  set-up  and  the  non-localized 
set-ups. 


framework,  then 

e{X)  =  L„(0o  +  (A  +  s)/M/Lnm- 
The  assumptions  below  will  be  stated  directly  in  terms  of  ^(A).   (Section  5  provides  more 
primitive  conditions  within  the  exponential  and  curved  exponential  family  framework.) 

Then,  the  posterior  or  quasi-posterior  density  for  A  takes  the  form  (impUcitly  indexed  by 
the  sample  size  n) 

/(A)  =  ryrhrr  (2.5) 

and  we  impose  conditions  that  force  the  posterior  to  satisfy  a  CLT  in  the  sense  of  approach- 
ing the  normal  density 

(j){X)  = ^ —  exp  I  -^A'JA 

(27r)'i/2det(J-i)^/2  V    2 

More  formally,  the  following  conditions  are  assumed  to  hold  for  ^(A)  as  the  sample  size 
n  — >  00.  These  conditions,  which  in  the  following  we  will  call  the  "CLT  conditions," 
explicitly  allow  for  an  increasing  parameter  dimension  d  (d  — >  oo): 

CI.  The  local  parameter  A  belongs  to  the  local  parameter  space  A  e  A  C  M'^.  The  vector 
s  is  a  zero  mean  vector  with  variance  0,  whose  eigenvalues  are  bounded  above  as 
n  ->  oo,  and  A  =  Jf  U  K'=,  where  is'  is  a  closed  ball  5(0,  ||fs:||)  with  \\K\\  =  CVd 
such  that  fj^  fiX)dX  >  1  -  Op(l)  and  Jj.  4>{X)dX  >  I  -  o{l)J 

C2.  The  lower  semi-continuous  posterior  or  quasi-posterior  function  i{X)  approaches  a 
quadratic  form  in  logs,  uniformly  in  K,  i.e.,  there  exist  positive  approximation  errors 
ei  and  £2  such  that  for  every  X  G  K, 


\n^{X)-l^-h'JX^ 


<  ei  4-  £2  •  A'JA/2,  (2.6) 


where  J  is  a  symmetric  positive  definite  matrix  with  eigenvalues  bounded  away 
from  zero  and  fi'om  above.    Also,  we  denote  the  ellipsoidal  norm  induced  by  J  as 

IK'lli  :=  WJ'^M- 
C3.  The  approximation  errors  ei  and  £2  satisfy  £1  =  Op{l),  and  £2  •  |lA'||j  =  Op(l). 

These  conditions  imply  that 

i{X)=g{X)-m{X) 


''Note  that   \\K\\   :=  sup{|ja||   ;  a  G   K}.    The  constant  C  need  not  grow  due  to  the  phenomenon  of 
concentration  of  measure  under  d  — >  00  asymptotics. 


r?0^. 


-^K 


Figure  1.  This  figure  illustrates  how  ln<'(A)  can  deviate  from  ln(;((A)  in- 
cluding possible  discontinuities  on  In^(A). 


ln5(A)  =  --A'JA, 


over  the  approximate  support  set  K  where 

(2.7) 

-ei  -  e2A'JA/2  <  Inm(A)  <  ei  +  e2A'JA/2.  (2.8) 

Figure  1  illustrates  the  kinds  of  deviations  of  lnl{X)  from  the  quadratic  curve  captured  by 
the  parameters  ei  and  €-2,  and  also  shows  the  types  of  discontinuities  and  non-convexities 
permitted  in  our  framework.  Parameter  ei  controls  the  size  of  local  discontinuities  and 
parameter  es  controls  the  global  tilting  away  from  the  cjuadratic  shape  of  the  normal  log- 
density. 

Theorem  1.  [Generalized  CLT  for  Quasi-Posteriors]  Under  the  conditions  stated  above, 
the  density  of  interest 

£(A) 


approaches  a  normal  density  (p{X)  with  variance  matrix  J  in  the  following  sense: 
I  1/(A)  -  0(A)|dA  =  /  1/(A)  -  cj>{X)\d\  +  Op(l)  =  Op(l). 

JA  JK 

Proof.  See  Appendix  A. 


(2.9) 

(2.10) 
D 


Comment  2.1.  Theorem  1  is  a  simple  preliminary  result.   However,  the  result  is  essen- 
tial for  defining  the  environment  in  which  main  results  of  this  paper  -  the  computational 


complexity  results  -  will  be  developed.  The  theorem  shows  that  in  large  samples,  provided 
some  regularity  conditions  hold,  Bayesian  and  Quasi-Bayesian  inference  has  good  large  sam- 
ple properties.  The  main  part  of  the  paper,  namely  Section  3,  develops  the  computational 
implications  of  the  CLT  conditions.  In  particular,  Section  3  shows  that  polynomial  time 
computing  of  Bayesian  and  Quasi-Bayesian  estimators  by  MCMC  is  in  fact  implied  by  the 
CLT  conditions. 

Comment  2.2.  By  allowing  increasing  dimension  (d  -^  cx))  Theorem  1  extends  the  CLT 
previously  derived  in  the  hterature  for  posteriors  in  the  likelihood  framework  (Bickel  and 
Yahav  [4],  Ibragimov  and  Hasminskii  [19],  Bunker  and  Milhadu  [5],  Ghosal  [17])  and  for 
quasi-posteriors  in  the  general  extremum  framework,  when  the  likehhood  is  replaced  by 
general  criterion  functions  (Chernozhukov  and  Hong  [8],  Liu,  Tian,  and  Wei  [28]).  The 
theorem  is  more  general  than  the  results  in  Ghosal  [17],  who  also  considered  increasing 
dimensions  but  limited  his  analysis  to  the  exponential  likelihood  family  framework.  In 
contrast,  Theorem  1  allows  for  non-exponential  families  and  allows  quasi-posteriors  in  place 
of  posteriors.  Recall  that  quasi-posteriors  result  from  using  quasi-hkehhoods  and  other 
criterion  functions  in  place  of  the  likelihood.  This  expands  substantially  the  scope  of  the 
applications  of  the  result.  Importantly,  Theorem  1  allows  for  non-smoothness  and  even 
discontinuities  in  the  Ukehhood  and  criterion  functions,  which  are  pertinent  in  a  number  of 
apphcations  listed  in  the  introduction. 

The  Problem  of  the  Paper.  Our  problem  is  to  characterize  the  complexity  of  obtaining 
draws  from  /(A)  and  of  Monte  Carlo  integration  J  g{X)f{X)dX,  where  /(A)  is  restricted  to 
the  approximate  support  K.  The  procedure  used  to  obtain  the  basic  draws  as  well  as  to 
carry  out  Monte  Carlo  integration  is  a  Metropolis  (Gaussian)  random  walk,  which  is  a 
standard  MCMC  algorithm  used  in  practice.  The  tasks  are  thus: 

I.      Characterize  the  complexity  of  sampling  from  /(A)  as  a  function  of  [d,  n,  ei,  £2,  K); 

II.  Characterize  the  complexity  of  calculating  J  g{X)f{X)dX  as  a  function  of  {d,  n,  £i,  £2,  K] 

III.  Characterize  the  complexity  of  sampling  from  /(A)  and  performing  integrations 
with  /(A)  in  large  samples  as  d,  ?z  — »  oo  by  invoking  the  bounds  on  (d,  n,  ei,  e2.  A') 
imposed  by  the  CLT; 

IV.  Verify  that  the  CLT  conditions  are  apphcable  in  a  variety  of  statistical  problems. 

This  paper  formulates  and  answers  this  problem.  Thus,  the  paper  brings  the  CLT  re- 
strictions into  the  complexity  analysis  and  develops  complexity  bounds  for  sampling  and 
integrating  from  /(A)  under  these  restrictions.    These  CLT  restrictions,  arising  by  using 


large  sample  theory  and  imposing  certain  regularity  conditions,  limit  the  behavior  of  /(A) 
over  the  approximate  support  set  K  in  a  specific  manner  that  allows  us  to  establish  poly- 
nomial computing  time  for  sampling  and  integration.  Because  the  conditions  for  the  CLT 
do  not  provide  strong  restrictions  on  the  tail  behavior  of  /(A)  outside  K  other  than  Cl, 
our  analysis  of  complexity  is  limited  entirely  to  the  approximate  support  set  K  defined  in 
C1-C3. 

Comment  2.3.  By  solving  the  above  problem,  this  paper  contributes  to  the  recent  liter- 
ature on  the  computational  complexity  of  Metropolis  procedures.  Early  work  was  primary 
concerned  with  the  question  of  approximating  the  volume  of  high  dimensional  convex  set's 
where  uniform  densities  play  a  fundamental  role  (Lovasz  and  Simonovits  [30],  Kannan, 
Lovasz  and  Simonovits  [24,  25]).  Later  the  approach  was  generalized  for  the  cases  where 
the  log-likelihood  is  concave  (Fi'ieze,  Kannan  and  Poison  [14],  Poison  [34],  and  Lovasz  and 
Vempala  [31,  32,  33]).  However,  under  log-concavity  the  maximum  hkelihood  estimators 
are  usually  preferred  over  Bayesian  or  quasi-Bayesian  estimator  from  a  computational  point 
of  view.  In  the  absence  of  concavity,  exactly  the  settings  where  there  is  a  great  practical 
appeal  for  using  Bayesian  and  quasi-Bayesian  estimates,  there  has  been  relatively  less,  if 
any,  analj'sis.  One  important  exception  is  the  paper  of  Applegate  and  Kannan  [1],  which 
covers  nearly-concave  but  smooth  densities  using  a  discrete  Metropolis  algorithm.  ®  In 
contrast  to  Applegate  and  Kannan  [1],  our  approach  allows  for  both  discontinuous  and 
non-concave  densities  that  are  permitted  to  deviate  fi^om  the  normal  density  (not  from  an 
arbitrary  log-concave  density,  like  in  Applegate  and  Kannan  [1])  in  a  specific  manner.  The 
manner  in  which  they  deviate  from  the  normal  is  motivated  by  the  CLT  and  controlled 
by  parameters  ti  and  e2,  which  are  in  turn  restricted  by  the  CLT  conditions.  The  CLT 
restrictions  provide  a  congenial  analytical  framework  that  allows  us  to  treat  a  basic  non- 
discrete  sampling  algorithm  frequently  used  in  practice.  In  fact,  it  is  known  that  the  basic 
Metropolis  walk  analyzed  here  does  not  have  good  complexity  properties  (rapidlj'  mixing) 
for  arbitrary  log-concave  density  functions,  in  opposition  to  other  random  walks  like  hit- 
and-run^.  Nonetheless,  the  CLT  conditions  imply  enough  structure  that  the  random  walk  is 
in  fact  rapidly  mixing.  Moreover,  only  Subsection  3.2  depends  on  the  particular  form  of  the 
walk  while  all  the  remaining  results  are  valid  in  a  considerably  more  general  setting.  This 
suggests  that  the  same  CLT  approach  can  be  used  to  establish  polynomial  bounds  for  more 
sophisticated  schemes.   As  is  standard  in  the  literature,  we  assume  that  the  starting  point 


The  discrete  Metropolis  algorithm  facilitates  the  analysis,  but  they  are  are  not  frequently  used  in  practice. 
See  Lovasz  and  Vempala  [33]  for  a  discussion. 


of  the  algorithm  is  in  the  approximate  support  of  the  posterior.  Indeed,  the  polynomial 
time  bound  that  we  derive  applies  only  in  this  case  because  this  is  the  domain  where  the 
CLT  provides  enough  structure  on  the  problem.  Our  analysis  does  not  apply  outside  this 
domain. 


3.  Sampling  from  /  using  a  Random  Walk 

3.1.  Set-Up  and  Main  Result.  In  this  section  we  bound  the  computational  complexity  of 
obtaining  a  draw  from  a  random  variable  approximately  distributed  according  to  a  density 
function  /  as  defined  in  (2.5).  (Section  4  builds  upon  these  results  to  study  the  associated 
integration  problem.)  By  invoking  Assumption  CI,  we  restrict  our  attention  entirely  to 
the  approximate  support  set  K  and  the  accuracy  of  sampling  will  be  defined  over  this  set. 
Consider  a  measurable  space  {K,A).  Our  task  is  to  draw  a  random  variable  according  to  a 
measurable  density  function  /  restricted  to  K  (this  density  induces  a  probability  distribution 
on  K  denoted  by  Q,  i.e.,  Q{A)  =  J^  f{x)dx/  Jj^  f{x)dx  for  all  A  e  A).  Asymptotically,  it  is 
well-known  that  random  walks  combined  with  a  Metropolis  filter  are  capable  of  performing 
such  task.  In  order  to  prove  our  complexity  bounds,  we  will  concentrate  on  a  commonly 
used  random  walk  induced  by  a  Gaussian  distribution.  Such  random  walk  is  completely 
characterized  by  an  initial  point  uq  and  a  fixed  standard  deviation  cr  >  0,  and  its  one- 
step  move.  The  latter  is  defined  as  the  procedure  of  drawing  a  point  y  according  to  a 
Gaussian  distribution  centered  on  the  current  point  u  with  covariance  matrix  cr"/  and  then, 
with  probability  mm{f{y)/f{u),  1}  =  imn{£{y)/i{u),  1}  move  to  y;  otherwise  stay  at  u  (see 
Casella  and  Robert  [6]  and  Vempala  [39]  for  details).  In  the  complexity  analysis  of  this 
algorithm  we  are  interested  in  bounding  the  number  of  steps  of  the  random  walk  required 
to  draw  a  random  variable  from  /  with  a  given  precision.  Equivalently,  we  are  interested 
in  bounding  the  number  of  evaluations  of  the  local  Ukelihood  function  (  required  for  this 
purpose. 

Next  we  review  definitions  of  important  concepts  relevant  for  our  analysis.  The  defini- 
tions of  these  concepts  follow  Lovasz  and  Simonovits  [30]  and  Vem.pala  [39].  Let  q{x\u) 
denote  the  density  function  associated  with  a  random  variable  N{u,  a"!),  and  luiA)  be  the 
indicator  function  of  the  set  A.  For  each  u  £  K  the  one-step  distribution  P^,  the  probability 
distribution  after  one  step  of  the  random  walk  starting  fi'om  u,  is  defined  as 

Pu{A)=   I       m:iiA^4r\A\q{x\u)dx-V0lu{A)  (.3.11) 


where 

e  =  1  -  f  mm\^,l\  q{x\u)dx  (3.12) 

Jk  [fW     J 

is  the  probabihty  of  staying  at  u  after  one  step  of  the  ball  walk  from  u.  A  step  of  the 
random  walk  is  said  to  be  proper  if  the  next  point  is  different  from  the  current  point  (which 
happens  with  probabihty  1—9). 

The  triple  (K,A,  {Pu  ■  u  G  K}),  along  with  a  starting  distribution  Qo,  defines  a  Markov 
chain  in  K.  We  denote  by  Qt  the  probability  distribution  obtained  after  t  steps  of  the 
random  walk.  A  distribution  Q  is  called  stationary  on  {K,A)  if  for  any  A  G  A, 

Pu{A)dQ[u)  =  Q{A).  (3.13) 

Given  the  random  walk  described  earher,  the  unique  stationary  probabihty  distribution  Q 
is  induced  by  the  function  /,  Q{A)  =  /^  f{x)dx/  Jj^  f{x)dx  for  all  A  £  A,  see  e.g.  Casella 
and  Roberts  [6] .  This  is  the  main  motivation  for  most  of  the  MCMC  studies  found  in  the 
literature  since  it  provides  an  asymptotic  method  to  approximate  the  density  of  interest. 
As  mentioned  before,  our  goal  is  to  properly  quantify  this  convergence  and  for  that  we  need 
to  review  additional  concepts. 

The  ergodic  flow  of  a  set  A  with  respect  to  a  distribution  Q  is  defined  as 


^{A)  =  /  Pu{K\A)dQ{u) 

J  A 


It  measures  the  probabihty  of  the  event  {u  G  A,u'  ^  A)  where  u  is  distributed  according 
to  Q  and  u'  is  obtained  after  one  step  of  the  random  walk  starting  from  u\  it  captures  the 
average  flow  of  points  leaving  A  in  one  step  of  the  random  walk.  It  follows  that  Q  is  a 
stationary  measure  if  and  only  if  ^{A)  =  $(A'\A)  for  aU  A  e  A  since 


Pu[K  \  A)dQ{u)  =  /  (1  -  Pu{A))  dQ{u)  =  Q{A)  -  f  Pu{A)dQ[i 

J  A  J  A 

Pu{A)dQ{u)  -  f  Pu{A)dQ{u)  =  ^[K  \  A). 


$(A) 


A  Markov  chain  is  said  to  be  ergodic  if  ^[A)  >  0  for  every  A  with  0  <  Q{A)  <  1,  which 
is  the  case  for  the  A-Iarkov  chain  induced  by  the  random  walk  described  earlier  due  to  the 
assumptions  on  /. 


In  order  to  compare  two  probability  distributions  P  and  Q  we  use  the  total  variation 
distance^° 

IIP  -  QWtv  =  sup  \P{A)  -  Q{A)\.  (3.14) 

ACK 

Moreover,  P  is  said  to  be  a  M-warm  start  with  respect  to  Q  if 

sup        ^<M.  (3.15) 

The  key  concepts  in  the  analysis  are  the  conductance  of  a  set  A,  which  is  defined  as 

^^    '       mm{Q{A),Q{K\A)y 
and  the  global  conductance,  defined  as 

,          .     .,,,               .          HA)               .          UPuiK\A)dQ{u) 
(p  =  mm  (p{A)  =        mm  =        mm        -^ -— — . 

A  0<QiA)<l/2Q{A)        0<QiA)<l/2  Q{A) 

Lovasz  and  Simonovits  [30]  proved  the  connection  between  conductance  and  convergence  for 
the  continuous  space  setting.  This  result  extended  an  earlier  result  of  Jerome  and  Sinclair 
[21,  22],  who  connected  convergence  and  conductance  for  discrete  state  spaces.  Lovasz  and 
Simonovits'  result  can  be  re-stated  as  follows. 

Theorem  2.  Let  Qq  be  a  M-warm  start  with  respect  to  the  stationary  distribution  Q.   Then, 

WQt  ~  QWtv  <  ^1  U  - 

Proof.  See  Lovasz  and  Simonovits  [30].  ,  D 

The  main  result  of  this  paper  provides  a  lower  bound  for  the  global  conductance  of  the 
Markov  chain  4>  under  the  CLT  conditions.  In  particular,  we  show  that  1/0  is  bounded  by 
a  fixed  polynomial  in  the  dimension  of  the  parameter  space. 

Theorem  3.  (Main  Result)  Under  Assumptions  Cl,  C2,  C3,  and  setting  the  parameter 
a  for  the  random  walk  as  defined  in  (3.17),  the  global  conductance  of  the  induced  Markov 
chain  satisfies 

l/0  =  o(de4^>+^^^ll^'l!'). 
In  particular,  the  random,  walk  requires  at  most  ■. 

N,  =  Op[  e«^i+8''^ll^'ii5    (-Y  ln(M/e: 


^'^This  distance  is  equivalent  to  the  O  (K)  distance  between  the  density  functions  associated  with  P  and 
Q  since  sup^gK  \PiA)  -  Q[A)\  =  \  J^  \dP  -  dQ\. 


steps  to  achieve  WQ^e  ~  QWtv  ^  £•  Invoking  the  CLT  restrictions,  ei  =  o(l),  £2  •  ||-^'^'||j  = 
0(1),  vue  have  that  1/0  =  Op{d)  and  the  number  of  steps  N^  is  bounded  by 

Op  {d^  ln(M/£))  . 

Proof.  See  Section  3.2.  D 

Comment  3.1.  In  general,  the  dependence  on  ei  and  eo  is  exponential  and  this  bound  does 
not  imply  polynomial  time  ("efficient")  computing.  However,  the  CLT  fi-amework  implies 
that  ei  =  0(1)  and  £2  ■  \\I'^\\j  =  o(l),  which  by  Theorem  3  in  turn  imphes  polynomial  time 
computing. 

Next  we  discuss  and  bound  the  dependence  on  M,  the  "distance"  of  the  initial  distribution 
Qo  from  the  stationary  distribution  Q  as  defined  in  (3.15).  A  natural  candidate  for  a  starting 
distribution  Qq  is  the  one-step  distribution  conditional  on  a  proper  move  from  an  arbitrary 
point  u  e  K.  We  emphasize  that,  in  general,  such  choice  of  Qq  could  lead  to  values  of 
M  that  are  arbitrary  large.  In  fact,  this  could  happen  even  in  the  case  of  the  stationary 
density  being  a  uniform  distribution  on  a  convex  set  (see  [33]).  Fortunately,  this  is  not  the 
case  under  the  CLT  framework  as  shown  by  the  following  lemma. 

Lemma  1.  Let  u  G  K  and  P^  be  the  associated  one-step  distribution.  With  probability  at 
least  1/3  the  random  walk  makes  a  proper  move.  Conditioned  on  peiforming  a  proper  m,ove, 
the  one-step  distribution  is  a  M-warm  start  start  for  f,  where 

InM  =  0(dln(\/d||/^||)  +  |jA'||5  +  ei  +  e2\\K\\-j). 

Under  the  CLT  restrictions,  e2\\K\\j  =  o(l)  and  \\K\\j  =  0[\/d),  so  that 

\nM  =  0{d\nd). 

Proof.  See  Appendix  A.  '  ■  .  '  D 

Combining  this  result  with  Theorem  3  yields  the  overall  (burn-in  plus  post  burn-in) 
running  time 

Op{d^\iid). 


3.2.  Proof  of  the  Main  Result.  The  proof  of  Theorem  3  uses  two  auxihary  results:  an 
iso-perimetric^'  inequality  and  a  geometric  property  of  the  particular  random  walk.  The 
first  is  an  analytical  result  and  is  of  independent  mathematical  interest.  After  the  connection 
between  the  iso-perimetric  inequality  and  the  ergodic  flow  is  established,  the  second  result 
allows  us  to  use  the  first  result  to  bound  the  conductance  from  below.  In  what  follows  we 
provide  an  outline  of  the  proof,  auxiliary  results,  and,  finally,  the  formal  proof 

3.2.1.  Outline  of  the  Proof.  The  proof  follows  the  arguments  in  Lovasz  and  Vempala  [31]. 
In  order  to  bound  the  ergodic  flow  of  A  £  A,  consider  the  particular  disjoint  partition 
K  =  5i  U  5^2  U  53  where  Si  C  A,  5^  C  K  \  A,  and  S3  consists  of  points  in  A  or  K  \A  for 
which  the  one-step  probability  of  going  to  the  other  set  is  at  least  a  fixed  constant  c  (to  be 
defined  later).  Therefore  we  have 

HA)    -  X4  Pu(K  \  A)dQ{u)  =  i  /^  P„(/i  \  A)dQ{u)  +  1  /^.^^  Pu{A)dQ{u) 
>  i  4  P„(A- \  A)dQ(«)  +  i  4  P„(.4)dQ(a)  +  f  Q(53). 

where  the  second  equality  holds  because  $(A)  =  ^{K  \A). 

Since  the  first  two  terms  could  be  arbitrarily  small,  the  result  will  follow  by  bounding 
the  last  term  from  below.  This  will  be  achieved  by  an  iso-perimetric  inequality  which  is 
tailored  to  the  CLT  framework  and  is  derived  in  Section  3.2.2.  This  result  will  provide  a 
lower  bound  on  QiS^),  which  is  decreasing  in  the  distance  between  S\  and  52.  Therefore 
one  still  needs  to  bound  the  distance  between  these  sets. 

Given  two  points  u  G  Si  and  v  £  52,  we  have  Pu{K\A)  <  c  and  Pv{A)  <  c.  Therefore,  the 
total  variation  distance  between  their  one-step  distributions  \\Pu  -  Py\\  >  \Pu{A)  -  Pv{A)\  > 
1  —  2c.  The  geometric  properties  of  the  random  walk  are  used  to  ensure  that  this  condition 
imphes  that  \\u  —  v\\  is  also  bounded  from  below  (see  Section  3.2.3).  Since  u  and  v  were 
arbitrary  points,  the  sets  Si  and  52  are  "far"  apart.  Therefore  53  cannot  be  arbitrarily 
small,  i.e.,  Q(53)  is  bounded  from  below. 

This  leads  to  a  lower  bound  for  the  global  conductance.  After  bounding  the  global 
conductance  from  below.  Theorem  3  follows  by  invoking  CLT  conditions  and  Theorem  2. 


The  iso-perimetric  problem  has  a  long  history  in  mathematics.  It  consists  of  finding  the  set  that 
minimizes  a  certain  criterion  within  a  particular  family  of  sets.  A  famous  and  familiar  example  of  an  iso- 
perimetric  problem  is  the  following:  among  all  sets  with  unit  volume,  find  the  one  that  has  the  smallest 
surface  area. 


3.2.2.  An  Iso-perimetric  Inequality.  We  start  by  defining  a  notion  of  approximate  log- 
concavity.  A  function  /  :  R"  — >  IR  is  said  to  be  log-/?-concave  if  for  every  a  €  [0,1], 
x,y  E  IR",  we  liave 

/(a.T  +  (l-a)y)>/?/(xr/(j/)i-" 

for  some  /3  £  (0, 1].  /  is  said  to  be  logconcave  if  P  can  be  talcen  equal  to  one.  The  class 
of  log-/?-concave  functions  is  rather  broad,  for  example,  including  various  non-smooth  and 
discontinuous  functions. 

Together,  the  relations  (2.7)  and  (2.8)  imply  that  we  can  write  the  functions  /  and  i  as 
the  product  of  e~^^  '^^  and  a  log-/?-concave  function: 

Lemma  2.  Over  the  set  K  the  functions  /(A)  :=  i(X)/  J,^({\)d\  and  ({X)  are  the  product 
of  a  Gaussian  function,  e~'^        ,  and  a  p -log- concave  function  whose  parameter  (3  satisfies 

ln/3>2-(-ei-e2-||A'||}). 

Proof.  It  foUows  from  (2.8).  D 

In  our  case,  the  larger  is  the  support  set  K,  the  larger  is  the  deviation  from  log-concavity. 
That  is  appropriate  since  the  CLT  does  not  impose  strong  restrictions  on  the  tail  of  the  prob- 
ability densities.  Nonetheless,  this  gives  a  convenient  structure  to  prove  an  iso-perimetric 
inequality  which  covers  even  non-continuous  cases  permitted  in  the  framework  described  in 
the  previous  sections. 

Lemma  3.  Consider  any  measurable  partition  of  the  form  K  =  5'i  U  52  U  Ss  such  that  the 
distance  between  S\  and  5*2  is  at  least  t,  i.e.  d{Si,S2)  >  t.  Let  Q{S)  =  J^  fdx/  Jj^  fdx. 
Then  for  any  lower  semi-continuous  function  f{x)  =  e~ll^ll  m{x),  where  m  is  a  log-P- 
concave  function,  we  have 

Q{S2)  >  /3^^^^  min  {Q(5i),  QiS.)}  ■ 

Proof  See  Appendix  A.  D 

Comment  3.2.  This  new  iso-perimetric  inequality  extends  the  iso-perimetric  inequality  in 
Kannan  and  Li  [23],  Theorem  2.L  The  proof  builds  on  their  proof  as  well  as  on  the  ideas 
in  Applegate  and  Kannan  [1].  Unhke  the  inequality  in  [23],  Lemma  3  removes  smoothness 
assumptions  on  /,  for  example,  covering  both  non-log-concave  and  discontinuous  cases. 


The  iso-periraetric  inequality  of  Lemma  3  states  that,  under  suitable  conditions,  if  two 
subsets  of  K  are  far  apart,  the  measure  of  the  remaining  subset  should  be  comparable  to 
the  measure  of  at  least  one  of  the  original  subsets.  The  following  corollary  extends  the 
previous  theorem  to  cover  cases  with  an  arbitrary  covariance  matrix  J . 

Corollary  1.  Consider  any  measurable  partition  of  the  form  K  =  Si  U  SsU  S2  such  that 
d{Si,  S2)  >  t,  and  let  Q{S)  =  J^  fdx/  /^  fdx.  Then  for  any  lower  semi- continuous  function 
f{x)  =  e~2^  '^^m{x),  where  m  is  a  log-P-concave  function,  we  have 


QiSs)  >  P  ie-^-"''/8  y'^^  j^in  {q(^Si),  Q{S2)}  , 

where  Xmin  denotes  the  minimum,  eigenvalue  of  the  positive  definite  matrix  J. 

Proof.  See  Appendix  A.  '  D 

3.2.3.  Bounds  on  the  Difference  of  One-step  Distributions.  Next  we  relate  the  total  varia- 
tion distance  between  two  one-step  distributions  with  the  Euclidean  distance  between  the 
points  that  induce  them.  Although  this  approach  follows  the  one  in  Lova^z  and  Vempala 
[31,  32,  33]  there  are  two  important  differences  which  call  for  a  new  proof.  First,  we  no 
longer  rely  on  log-concavity  of  /.  Second,  we  use  a  different  random  walk.  We  start  with 
the  following  auxiliary  result. 

Lemma  4.  Let  g  :  IR"  — >  IR  6e  a  function  such  that  Ing  is  Lipschitz  luith  constant  L  over 
compact  set  K .   Then,  for  every  x  e  K  and  r  >  0, 

,      :,,  inf        [g{y)/g[x)\>e-'^\ 

yeB{x,r)nK 

Proof  The  result  is  obvious.  D 

Given  a  compact  set  K,  we  can  bound  the  Lipschitz  constant  of  the  concave  function  Ing 
defined  in  (2.7)  by 

L  <  sup  ]|Vln5(A)||  <  sup  ||JA||  <  Xmax\\K\\  =  0  (Vd)  .  (3.16) 

Lemma  5.  Let  u,v  e  K  :=  B{0,  ||A'|1),  cr^  <  jg^j;^,  and  suppose  that  jr^  <  j^  and  \\u  — 
v\\  <  ^  where  L  is  the  Lipschitz  constant  of  Ing  on  the  set  K.  Under  our  assumptions  on 
f  as  defined  in  (2.5),  we  have 

||Pu-p.||tv<i-^. 

Proof.  See  Appendix  A.  '     D 


The  converse  of  Lemma  5  states  that  if  two  points  induce  one-step  probability  distribu- 
tions that  are  far  apart  in  the  total  variation  norm,  these  points  must  also  be  far  apart  in 
the  Euclidean  norm.  This  geometric  result  provides  a  key  ingredient  in  the  application  of 
the  iso-perimetric  inequality  as  discussed  earher. 

3.2.4.  Proof  of  Theorem  3.  Consider  the  compact  support  for  f  as  K  —  B{0,  \\K\\), 
where  ||ii'||  =  0{\/d/Xmin)  from  Assumption  CI.  We  define 

a  =  min  {l/4VdL,  \\K\\/V2{)d\  .  (3.17) 

Therefore  the  assumptions  of  Lemma  5  are  satisfied.  Moreover,  under  the  assumptions  of 
the  theorem,  using  (3.16)  it  follows  that 

cj>  ■= .  ,  (3.18) 

l20\raaxVd\\K\\ 

Fix  an  arbitrary  set  A  e  ^  and  denote  hy  A'^  =  K\A  the  complement  of  A  with  respect 
to  K.  We  will  prove  that 

HA)  =  j  Pu{A')dQiu)  >  ^a^%~mm{Q{A),QiA')},  (3.19) 

which  implies  the  desired  bound  on  the  global  conductance  (j).  Note  that  this  is  equivalent 
to  bounding  #(A'^)  since  Q  is  stationary  on  {K,A). 
Consider  the  following  auxiliary  definitions: 

Si^LeA:  Pu{A')  <-^Y    S2  =  lveA':  P„(A)  <  ^  |  ,  and  S^  =  K\{Si  U  S2). 

First  assume  that  Q{Si)  <  Q{A)/2  (a  similar  argument  can  be  made  for  52  and  A'').   In 
this  case,  we  have 

$(A)  =  /  P,{A')dQ{u)  >  f       P^[A^)dQ{u)  >   f       fdQiu)  >  ^Q{A\S,)  >  ^Q{A), 
J  A  Ja\Si  JA\Si  6e  6e  12e 

and  the  inequahty  (3.19)  foUows. 

Next  assume  that  QiSi)  >  Q{A)/2  and  Q{S2)  >  QiA'')/2.  Since  $(A)  =  ^{A')  we  have 
that 

^A)  =  j  P„{A')dQ{u)    =    lJ^Pu{A')dQ{u)  +  lJ^,P,,{A)dQiv) 


A\Si  ^  "^^  ;^^\^J  -r  2  Ja<=\S2  ■ 
2  J53  G^dQiu)  —  j^( 


>    Us.ldQ{u)^4-Q{S^), 


where  we  used  that  ^3  =  K  \  {S\  U  52)  =  (^  \  ^i)  U  [A"  \  52).  Given  the  definitions  of  the 
sets  Si  and  5^2,  for  every  u  £  S\  and  t;  e  52  we  have 

\\Pu  -  PvWtv  >  Pu{A)  -  Py{A)  =  1  -  Pu{A')  -  Pv{A)  >  1  -  ^- 

In  such  case,  by  Lemma  5,  we  have  that  ||u  —  i'||  >  |  for  every  u  €  Si  and  v  G  52.  Thus, 
we  can  apply  the  iso-perimetric  inequahty  of  CoroUary  1,  with  d{Si,S2)  >  cr/S,  to  bound 
(5(53).  We  obtain 

J  Pu{A^)dQ{u)     >     g|e-i^-"-/64  y2^min{Q(5i),Q(52)} 
>     ^%^min{Q(A),Q(A^)}. 

where  tire  second  inequahty  also  used  that  Xmin'^'^  <  A^j„  ,|'  ^y^  <  '^/d  under  our  defini- 
tions. Therefore,  using  relation  (3.18)  and  \\K\\  =  0{\Jd/\min)^  we  obtain 

i/4>  =  o  f/j-^^d)  =o(d  e'^^+'^-mA 

since  the  eigenvalues  are  assumed  to  be  uniforml}^  bounded  from  above  and  away  from  zero. 

The  remaining  results  in  Theorem  3  follow  by  invoking  the  CLT  conditions  and  applying 

Theorem  2  with  the  above  bound  on  the  conductance.  D 


4.  Complexity  of  Monte  Carlo  Integration 

This  section  considers  our  second  problem  of  interest  -  that  of  computing  a  high  dimen- 
sional integral  of  a  bounded  real  valued  function  g: 


ixg  =   /    g{\)!{\)d\.  (4.20) 

■  Jk 

The  integral  is  computed  by  simulating  a  dependent  (Markovian)  sequence  of  random  points 
A"*,  A^,  . . .,  which  has  /  as  the  stationary  distribution,  and  taking 

1     ^ 
/^3  =  ]^Eff('\')  (4.21) 

1=1 

as  an  approximation  to  (4.20).  The  dependent  nature  of  the  sample  increases  the  sample 
size  needed  to  achieve  a  desired  precision  compared  to  the  (infeasible)  case  of  independent 
draws  from  /.  It  turns  out  that  as  in  the  preceding  analysis,  the  global  conductance  of  the 
the  Markov  chain  sample  will  be  crucial  in  determining  the  appropriate  sample  size. 


The  starting  point  of  our  analysis  is  a  central  limit  theorem  for  reversible  Markov  chains 
due  to  Kipnis  and  Varadhan  [26]  which  is  restated  here  for  convenience.  Consider  a  re- 
versible Markov  chain  on  K  with  stationary  distribution  /.  The  lag  k  autocovariance  of  the 
stationary  time  series  {s'(A')}^^j,  obtained  by  starting  the  Markov  chain  with  the  stationary 
distribution  /  is  defined  as 


7fc  =  Cov/(g(A^),5(V+'^-)). 


Let  us  recall  a  characterization  of  7^  via  spectral  theory  following  Kipnis  and  Varadhan 
[26] :  Let  T  denote  the  transition  operator  of  the  Markov  chain  induced  by  the  random  walk. 
In  our  case,  since  the  chain  is  reversible,  T  is  a  linear  bounded  self-adjoint  operator  in  the 
Hilbert  space  Lr{K,A,Q),  see  [30].  Let  Eg  denote  the  measure  on  Borel  sets  of  (—1,1) 
induced  by  the  spectral  measure  of  T  applied  to  g,  as  in  [16].  With  these  definitions,  one 
has  that  for  any  k 

7fc=     f  J'^kEa 


We  are  prepared  to  restate  the  central  hmit  theorem  of  Kipnis  and  Varadhan  [26]  needed 
for  our  analysis. 

Theorem  4.  For  a  stationary,  irreducible,  reversible  Markov  chain,  with  'p.g  and  fig  defined 
as  (4.21)  and  (4.20), 

+00 


k  —  —  oo 

almost  surely.  If  cr^  is  finite,  then  \fN{flg  —fig)  converges  in  distribution  to  N{Q,a'l). 

Proof.  See  Kipnis  and  Varadhan  [26].  D 

In  our  case,  70  is  finite  since  g  is  bounded.  The  next  result,  which  builds  upon  Theorem 
2,  states  that  cr^  can  be  bounded  using  the  global  conductance  of  the  Markov  chain. 

Corollary  2.  Let  g  be  a  square  integrable  function  with  respect  to  the  stationary  measure 
Q.    Under  the  assumptions  of  Theorem.  4,  we  have  that 

7fc  <  I  1  -  y  I      70    and   o-g  <  70  I  -2 
Proof.  See  Lovaasz  and  Simonovits  [30]. 


We  will  use  the  mean  square  error  as  the  measure  of  closeness  for  a  consistent  estimator: 

MSEiflg)     =    E\fig    -    Mg]2. 

Many  approaches  are  possible  for  constructing  the  sequence  of  draws  in  (4.21);  we  refer 
to  [16]  for  a  detailed  discussion.  Here,  we  will  analyze  three  common  schemes: 

•  long  run  (Ir), 

•  subsample  (ss), 

•  multi-start  (ms). 

Denote  the  sample  sizes  corresponding  to  each  method  as  A';,.,  A^ss,  and  Nms-  The  long 
run  scheme  consists  of  generating  the  first  point  using  the  starting  distribution  and,  after 
the  burn-in  period,  selecting  the  Ni^  subsequent  points  to  compute  the  sample  average 
(4.21).  The  subsample  method  also  uses  only  one  sample  path,  but  the  Nss  draws  used 
in  the  sample  average  (4.21)  are  spaced  out  by  S  steps  of  the  chain.  Finally,  the  multi- 
start  scheme  uses  Nms  different  sample  paths,  initializing  each  one  independently  from  the 
starting  distribution  /o  and  picking  the  last  draw  in  each  sample  path  after  the  burn-in 
period  to  be  used  in  (4.21). 

There  is  another  issue  that  must  be  addressed.  All  schemes  require  that  the  initial 
points  are  drawn  from  the  stationary  distribution  /.  We  therefore  need  to  compute  the  so 
called  "burn-in"  period  B,  that  is,  the  number  of  iterations  required  to  reach  the  stationary 
distribution  /  with  a  desired  precision,  beginning  at  the  starting  distribution  /q. 

Theorem  5.  Let  /o  be  a  M-warm  start  with  respect  to  f,  and  g  :=  sup;^^;^;  |5('^)|-  Using 
the  notation  introduced  in  this  section,  to  obtain  MSE{fig)  <  e  it  is  sufficient  to  use  the 
following  lengths  of  the  burn-in  sample.  B,  and  post-burn  in  samples,  Nij.,Nss,Mms- 

Tif 


In 

and 

^^r  =  -:|,     A^..  =  —  {unth  S  =  {2/<p^)  In  (670/e)),     A^^.  =  |^. 
The  overall  complexities  of  Ir,  ss,  and  ms  methods  are  thus  B  +  Ni,.,  B  +  SN^s,  and  B  x  Nms- 

Proof.  See  Appendix  A.  .  ■  .  D 

For  convenience  Table  1  tabulates  the  bounds  for  the  three  different  schemes.  Note 
that  the  dependence  on  M  and  g  is  only  via  log  terms.  Although  the  optimal  choice  of 
the  method  depends  on  the  particular  values  of  the  constants,  when  £  \  0,  the  long-run 


Table  1.  Burn-in  and  Post  Burn-in  Complexities 

Method  Quantities       Complexity 

Long  Run      B  +  iV,,          ^  An  ^^vfal^U  ^  (lio) 
Subsample     B  +  N.^  ■  S     ;^  (in  f  ^^^  j) -f  ^  (^  In  (^)) 
Multi-start    B  x  N^^         ^  In  ((^^^)  )  x  ^ 

Table  2.  Burn-in  and  Post  Burn-in  Complexities  under  the  CLT. 

Metliod  Burn-in  Complexity     Post-burn-in  Complexity 

Long  Run        Op(dHnd  ■  IriE-'^)      +Op{d'^-£-^) 
Subsample       Op{dHnd  ■Ine''^)       +  Op{d'^  ■  e''^  ■  Ine-'^) 
Multi-start      Op((i^  Inrf  ■  Ine"!)       x  Op(£"^) 


algorithm  has  the  smallest  (best)  bound,  while  the  the  multi-start  algorithm  has  the  largest 
(worst)  bound  on  the  number  of  iterations.  Table  2  presents  complexities  impUed  by  the 
CLT  conditions,  namely  |jA'||  =  0{Vd),  ei  -^  0,  and  £2||-ft^P  -^  0.  The  table  assumes  70 
and  g  are  constant,  though  it  is  straightforward  to  tabulate  the  results  for  the  case  where  70 
and  g  grow  at  polynomial  speed  with  d.  Finally,  note  that  the  bounds  apply  under  a  slightly 
weaker  condition  than  the  CLT  requires,  namely  that  ei  =  Op(l)  and  e2||A'|P  =  Op{l). 

5.  Application  to  Exponential  and  Curved-Exponential  Families 

In  this  section  we  verify  that  our  conditions  and  analysis  apply  to  a  variety  of  statistical 
problems.  We  begin  the  discussion  with  the  canonical  log-concave  cases  within  the  expo- 
nential family.  Then  we  drop  the  concavity  and  smoothness  assumptions  to  illustrate  the 
full  applicability  of  the  approach  developed  in  this  paper. 

5.1.  Concave  Cases.  Exponential  families  play  a  very  important  role  in  statistical  esti- 
mation, cf.  Lehmann  and  Casella  [27],  especially  in  high-dimensional  contexts,  cf.  Portnoy 
[35],  Ghosal  [17],  and  Stone  et  al.  [37].  For  example,  the  high-dimensional  situations  arise 


in  modern  data  sets  in  technometric  and  econometric  applications.  Moreover,  exponen- 
tial familes  have  excellent  approximation  properties  and  are  useful  for  approximation  of 
densities  that  are  not  necessarily  of  the  exponential  form,  cf.  Stone  et  al.  [37]. 

Our  discussion  is  based  on  the  asymptotic  analysis  of  Ghosal  [17].   In  order  to  simplify 
exposition,  we  invoke  the  more  canonical  assumptions  similar  to  those  given  in  Portnoy  [35]. 

El.  Let  xi, . . .  ,Xn  be  iid  observations  from  a  d-dimensional  canonical  exponential 
family  with  density 

f{x;e)=exp{x'9-i>n{0)), 

where  6'  E  0  is  an  open  subset  of  IR  ,  and  d  —>  oo  as  n  — >  oo.  Fix  a  sequence 
of  parameter  points  9q  £  Q.  Set  fx  =  'fp'i&o)  and  F  =  ip"{Oo),  the  mean  and 
covariance  of  the  observations,  respectively.  Following  Portnoy  [35],  we  implicitly 
re-parameterize  the  problem,  so  that  the  Fisher  information  matrix  F  =  I. 

For  a  given  prior  n  on  0,  the  posterior  density  of  0  over  0  conditioned  on  the  data  takes 
the  form 

n 

TTniO)  oc  tt{9)  ■  [|/(.x,;6I)  =  n{e)  ■  exp  {nxO  -  nij{9))  . 

2  =  1 

The  local  parameter  space  is  ^/ri{Q  —  9o).  It  will  be  convenient  to  associate  every  point  9 
in  the  parameter  space  0  with  an  element  of  A,  a  translation  of  the  local  parameter  space, 

A  =  ^i{9  -  9o}  -  s, 

where  s  =  y/n{x  —  /i)  is  a  first  order  approximation  to  the  normalized  meLximum  likeli- 
hood/extremum  estimate.  By  design,  we  have  that  E[s\  —  0  and  E  [ss']  =  Id-  Moreover, 
by  Chebyshev's  inequahty,  the  norm  of  s  can  be  bounded  in  probability,  ||s|]  =  Op{'\/d). 
Finally,  the  posterior  density  of  A  over  A  =  x/n(0  —  9q)  —  s  is  given  by  /(A)  =  r  }(x)d\' 
where,  for  x  =  X]"=i  Xi/n, 

^(A)  =  exp  (x■'(^/^(A  +  s))  +  n{4>{9o  +  (A  +  s)/^i)  -  ^(^o)))  ■  ^(^o  +  (A  -I-  s)/0I). 

We  impose  the  following  regularity  conditions,  following  Ghosal  [17]  and  Portnoy  [35]: 

E2.  Consider  the  following  quantities  associated  with  higher  moments  in  a  neigh- 
borhood of  the  true  parameter  9o: 

Bin(c)  :=  sup{Ee|o'(.'c,  -  ii)\^  :  a  G  IR^  ||a||  =  1,  ||^  -  ^ojl^  <  cd/n}, 
e.a 

B2nic):=sup{Ee\a'{x,~^i)\'^:aeU'^,\\a\\  =  l,\\9-9of<cd/n}. 
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For  any  c  >  0  and  all  n  there  are  p  >  0  and  cq  >  0  such  that 

Bin{c)  <co  +  d'  and  S2n(c)  <  cq  +  d". 

E3.  The  prior  density  n  is  proper  and  satisfies  a  positivity  requirement  at  the  true 

parameter 

supln[7r(6i)/7r(6'o)]  =  0{d) 
eee 
where  9o  is  the  true  parameter.    Moreover,  the  prior  n  also  satisfies  the  following 

local  Lipschitz  condition 

\\mr{e)  -lm:{9o)\  <V{c)^\\e  ~  eoW 

for  all  6  such  that  ||^  —  9q\\"  <  cd/n,  and  some  V{c)  such  that  V{c)  <  cq  +  c^,  the 
latter  holding  for  all  c  >  0. 

E4  The  following  condition  on  the  growth  rate  of  the  dimension  of  the  parameter 
space  is  assumed  to  hold: 

d^/n  -^  0. 

Comment  5.1.  Condition  E2  strengthens  an  analogous  assumption  of  Ghosal  [17].  Both 
assumption  are  imphed  by  the  analogous  assumption  made  by  Portnoy  [35].  Condition 
E3  is  similar  to  the  assumption  on  the  prior  in  Ghosal  [17].  For  further  discussion  of  this 
assumption,  see  [3].  Condition  E4  states  that  the  parameter  dimension  should  not  grow  too 
quickly  relative  to  the  sample  size. 

Theorem  6.   Conditions  EI-E4  imply  conditions  C1-C3. 

Proof.  See  Appendix  A.  D 

Combining  Theorems  1  and  6,  we  have  the  asymptotic  normality  of  the  posterior, 


\f{X)-4>{X)\d\=   /    |/(A)-0(A)|<iA  +  Op(l)=Op(l). 

A  Jk 

Furthermore,  we  can  apply  Theorem  3  to  the  posterior  density  /  to  bound  the  convergence 
time  (number  of  steps)  of  the  Metropohs  walk  needed  to  obtain  a  draw  from  /  (with  a  fixed 
level  of  accuracy);  The  convergence  time  is  at  most 

Opid') 

after  the  burn-in  period;  together  with  the  burn-in,  the  convergence  time  is 

Opid^nd). 


Finally,  the  integration  bounds  stated  in  the  previous  section  also  apply  to  the  posterior  /. 

5.2.  Non-Concave  and  Discontinuous  Cases.  Next  we  consider  the  case  of  a  d-dimen- 
sional  curved  exponential  family.  Being  a  generaHzation  of  the  canonical  exponential  family, 
its  analysis  has  many  similarities  with  the  previous  example.  Nonetheless,  it  is  general 
enough  to  allow  for  non-concavities  and  even  various  kinds  of  non-smoothness  in  the  log- 
likelihood  function. 

NEl.  Let  xi, . . .  ,Xn  be  iid  observations  fi'om  a  d-dimensional  curved  exponential 
family  with  density 

/(.T;e)=exp(x'0(77)-V„(0(r?))),  ,     . 

where  ^  £  0,  an  open  subset  of  IR  ,  and  d  — >  oo  as  n  -^  oo. 

NE2.  The  parameter  of  interest  is  7/,  whose  true  value  rjo  lies  in  the  interior  of 
a  convex  compact  set  ^f  c  IR^^  The  true  value  of  6  induced  by  rjo  is  given  by 
^0  =  ^(%)-  The  mapping  rj  h^  9(7])  takes  values  from  IR''^  to  IR''  where  c-d  <  di  <  d, 
for  some  c  >  0.  Moreover,  assume  that  rjo  is  the  unique  solution  to  the  system 
9{'q)  =  6q  and  that  1|0(?7)  -  r/(6'o)||  >  eo!!?/  -  '/oil  for  some  eo  >  0  and  all  ?/  G  *. 

Thus,  the  parameter  9  corresponds  to  a  high-dimensional  linear  parametrization  of  the 
log-density,  and  rj  describes  the  lower-dimensional  parametrization  of  the  density  of  interest. 
There  are  many  classical  examples  of  curved  exponential  families;  see  for  example  Efron 
[12],  Lehmann  and  Casella  [27],  and  Bandorff-Nielsen  [2].  An  example  of  the  condition  that 
puts  a  curved  structure  onto  an  exponential  family  is  a  moment  restriction  of  the  type: 

rn{x,a)f{x,9)dx  =  0. 

This  condition  restricts  9  to  lie  on  a  curve  that  can  be  parameterized  as  {9{rj),i]  £  ^}, 
where  component  rj  =  (a,/3)  contains  a  as  well  as  other  parameters  0.  In  econometric 
apphcations,  often  moment  restrictions  represent  Euler  equations  that  result  fi-om  the  data 
X  being  an  outcome  of  an  optimization  by  rational  decision-makers;  see  e.g.  Hansen  and 
Singleton  [18],  Chamberlain  [7],  Imbens  [20],  and  Donald,  Imbens  and  Newey  [10].  Thus,  the 
curved  exponential  framework  is  a  fundamental  complement  of  the  exponential  fi-amework, 
at  least  in  certain  fields  of  data  analysis. 

We  require  the  following  additional  regularity  conditions  on  the  mapping  9{-). 

NE3.  For  every  k,  and  uniformly  in  7  £  B(0,K.\/(i),  there  exists  a  linear  operator 
G  :  IR'''  -^  IR'^  such  that  G'G  has  eigenvalues  bounded  from  above  and  away  from 


Figure  2.  This  figure  illustrates  the  mapping  9{-).  The  (discontinuous) 
solid  line  is  the  mapping  while  the  dash  line  represents  the  linear  map  induced 
by  G.  The  dash-dot  line  represents  the  deviation  band  controlled  by  ri„  and 
r2n- 


zero,  and  for  every  n 

v^  (0(770  +  7/\/H)  -  d{vo))  =  rm  +  {Id  +  R2n)G-f, 
where  j|rin||  <  5i„  and  ||i?2n|j  <  <52n-  Moreover,  those  coefficients  are  such  that 
^inVa^O   and   52nd --*0. 

Thus  the  mapping  rj  i— >  9{i])  is  allowed  to  be  nonlinear  and  discontinuous.  For  example, 
the  additional  condition  of  Ji„  =  0  implies  the  continuity  of  the  mapping  in  a  neighborhood 
of  770-  More  generally,  condition  NE3  does  impose  that  the  map  admits  an  approximate 
linearization  in  the  neighborhood  of  rjo  whose  quality  is  controlled  by  the  errors  Sin  and 
S2n-  An  example  of  a  kind  of  map  allowed  in  this  framework  is  given  in  the  figure. 

Again,  given  a  prior  tt  on  0,  the  posterior  of  rj  given  the  data  is  denoted  by 

n 

Mv)  «  7r(0(r/))  •  H  fixi;  rj)  =  77(^(77))  ■  exp  {ni'e{ii)  -  ni>{9{T])))  . 

i=l 

In  this  framework,  we  also  define  the  local  parameters  to  describe  contiguous  deviations 
from  the  true  parameter  as 

J  =  y/niv  -  Vo)  -  s,    s  =  {G'G)~^G'\/n{x  -  11), 


where  s  is  a  first  order  approximation  to  the  normalized  maximum  likehhood/extremum 
estimate.  Again,  similar  bounds  hold  for  s:  E[s]  —  0,  E[ss']  =  {G'G)~^,  and  ||s'||  =  Op(Vd). 
The  posterior  density  of  7  over  T,  where  F  =  ^{'^  -  'i]o)  -  s,  is  /(7)  =   ,-  /Z!^  ,  where 

£(7)  =  exp  {nx'{e{r,o  +  (7  +  s)/V^)  -  e{r]o))  +  n^iOim  +  (7  +  s)/07))  -  niid{^o))) 

^n{d{rio  +  h  +  s)/V^)). 

(5.22) 

The  condition  on  the  prior  is  the  following: 

NE4  The  prior  n{rj)  <x  77(^(77)),  where  Tr{9)  satisfies  condition  E3. 

Theorem  7.   Conditions  E2-E4  arid  NEI-NE4  imply  conditions  C1-C3.  ,  ■; 

Proof.  See  Appendix  A.  ■     ,  lH 

As  before,  Theorems  1  and  7  prove  the  asymptotic  normality  of  the  posterior, 

/"  1/(7)  -  0(7)M7  =/   1/(7)- <A(7)|rf7  +  Op(l)=Op(l),  -     ■;    " 

where 

<t^[l)  = ^- 77^  exp  f -^7'(G'G)7 

(27r)^/2det((G'G)-i)i/2      '  \    2' "^ 

Theorem  3  implies  further  that  the  main  results  of  the  paper  on  the  polynomial  time 
sampling  and  integration  apply  to  this  curved  exponential  family.  '  ■ 

'     '  6.  Conclusion 

Tlris  paper  studies  the  computational  complexity  of  Bayesian  and  quasi-Bayesian  esti- 
mation in  large  samples  carried  out  using  a  basic  Metropolis  random  walk.  Our  framework 
permits  the  parameter  dimension  of  the  problem  to  grow  to  infinity  and  allows  the  underly- 
ing log-likelihood  or  extremum  criterion  function  to  be  discontinuous  and/or  non-concave. 
We  establish  polynomial  complexity  by  exploiting  a  central  hmit  theorem  framework  which 
provides  structural  restrictions  for  the  problem,  i.e.,  the  posterior  or  quasi-posterior  density 
approaches  a  normal  density  in  large  samples.  ,,'       ..',,.■'.,,    .  ;      _ 

The  analysis  of  this  paper  focused  on  a  basic  random  walk.  Although  it  is  widely  used 
for  its  simplicity,  it  is  not  the  most  sophisticated  algorithm  available.  Thus,  in  principle 
further  improvements  could  be  obtained  by  considering  different  kinds  of  random  walks 
(or  variance  reduction  schemes).    As  mentioned  before,  essentially  only  one  lemma  of  our 


analysis  relies  on  the  particular  choice  of  the  random  walk.  This  suggests  that  most  of  the 
analysis  is  apphcable  to  a  variety  of  different  implementations. 


Appendix  A.  Proofs  of  Other  Results 

Proof  of  Theorem  1.     Prom  CI  it  follows  that 

f  \f{X)-4>{X)\dX    <     f  \f{\)-4>W\dX+  f    {f{X)  +  4>iX))dX 

=     /   |/(A)-^(A)|a!A  +  Op(l) 
Jk 

where  the  last  equahty  follows  from  Assumption  Cl."^^ 

.T        J            ^         (27r)'^/2det(J-i)i/2  . 

JNow,  denote  C^  = ? — - — r-; and  write 


m 

<P{X) 


- 1 


^{X)dX    = 


Cn  ■  exp[  In  e{X)-  (~-A'JA 


l\(p{X)dX 


Combining  the  expansion  in  C2  with  conditions  imposed  in  C3 

/(A) 
0(A) 


-1  (j){X)dX    <    Jj^\Cn-exp{ei  +  e2X'JX)-l\4>{X)dX 

+  jj.  \Cn  ■  exp  (-ei  -  ezA' JA)  -  1|  <^(A)dA 

<  2  /  |c„-e°p(^)-lU(A)dA 

Jk  '  ' 

<  2|C„e°p(i)  -  1| 

The  proof  then  follows  by  showing  that  Cn  -^  1-  We  have  that  R  =  \\K\\  =  0{\/d),  and  by 
assumption  Cl 


C„ 


|A||<fl 


e{X)dx 


.iVJAg-ei-f(A'JA)^;^ 


(1+0(1))/  g{X)dX  (1  +  0(1))/  e-2^''^dX 

Jm<fi  Jm<R 


-iA'(J+£2J)A 


det(J)        7iiA|l<fi  (2^)'^/2  det((J  +  eo^J)-')'/^ 


dX 


(1+0(1))  V  det(J  +  e2J) 


l|Ai|<fl(27r)^/2det(J-i)V2 


dX 


For  the  case  of  4>!  it  follows  from  the  standard  concentration  of  measure  arguments  for  Gaussian 
densities,  see  Lovasz  and  Vempala  [31]. 


Since  £2  <  1/2,  we  can  define  W  ~  yV(0,  (1 +62)"^^"^)  and  V~  N{Q,J-'^)  and  rewrite  our 
bounds  as 

1        /||A||<fi^(^)^'^     >         e-2^1       f     1      Y/^P{\\W\\<R) 


l+o(l)J^|A||<fl5(A)rfA     -     {l+o{l))\l+e2j        P{\\V\\<R) 


(1+0(1))  Vl+e2 


1 


d/2 


where  the  last  inequality  follows  from  P{\\W\\  <  R)  >  P{\\./r+l^_W\\  <  R)  =  P{\\V\\  <  R). 
Likewise, 


<^n       /||;,|]<^fi(A)rfA  Vl-e2, 

Therefore  Cn  —^  i  since  ei  — >  0,  €2  ■  c'  — *  0.       D 
Proof  of  Lemma  2.  The  result  follows  immediately  from  equations  (2.7)-(2.8).  D 

Proof  of  Lemma  3.      Let  M  :=  P^^^  r- — .    We  will  prove  the  lemma  bj'  contradiction. 
Assume  that  there  exists  a  partition  of  K  =  Si  U  ^2  U  S3,  with  d{Si,  S2)  >  t  such  that 

{Mls,{x)  ~  ls,(x))f{x)dx  >  0,  for  i  =  1,2. 

We  will  use  the  Localization  Lemma  of  Kannan,  Lovasz,  and  Simonovits  [24]  in  order  to 
reduce  a  high-dimensional  integral  to  a  low-dimensional  integral. 

Lemma  6   (Localization  Lemma).    Let  g  and  h  he  two  lower  semi-continuous  Lebesgue 
integrable  functions  on  IR    such  that 


g{x)dx  >  0  and    /     h{x)dx  >  0. 
Then  there  exist  two  points  a,b  £  M.^,  and  a  linear  function  7  :  [0, 1]  -^  IR-i-  such  that 


T~{t)g{{l-t)a  +  tb)dt>0  and    /    f'-^(t)hi{l~t)a  +  tb)dt>0, 

Jo 

where  ([a, 6], 7)  is  said  to  form  a  needle. 

Proof.  See  Kannan,  Lovasz,  and  Simonovits  [24]. 

By  the  Localization  Lemma,  there  exists  a  needle  (0,6,7)  such  that 

i''~\-u-)f{{'^  -u)a  +  ub){Mls.^{{l  -u)a  +  ub)  -  ls:,{i.l  -u)a  +  ub))du  >  0, 


for  i  =  1,2.    Equivalently,  using  7(u)  =  l{u/\\h  -  a\\)  and  v  :=  {h  -  a)/\\b  -  a\\  where 
||6  —  a\\  >  t,  we  have 

\\b-a\\ 

1      {.u)f{a  +  uv)  (Mls^ia  +  uv)  -  Is-^ia  +  uv))  du  >  0, 
for  i  —  1,2.  In  turn,  this  last  expression  can  be  rewritten  as,  for  i  =  1, 2, 

/■[|6-a||  r 

M  'y'^-\u)f{a  +  uv)ls,{a  +  uv)du>  ^'^-'^{u)ls^{a  +  uv)f{a  +  uv)du.     (A.23) 

In  order  for  the  left  hand  side  of  (A.23)  be  positive  for  i  =  1  and  i  =  2,  the  line  segment 
[a,b]  must  contain  points  in  Si  and  ^2.  Since  d{Si,S2)  >  t,  we  have  that  ^3  n  [a,  b]  contains 
an  interval  whose  length  is  at  least  t.  We  will  prove  that  for  every  uj  e  IR 


Y     {u)f{a+uv)du>  M  mm  <^         Y-\u)f{a  +  uv)du,    /  -f'^-\u)f{a  +  uv)du\ 

(A.  24) 
which  contradicts  relation  (A.23)  and  proves  the  lemma. 

First,  note  that  f{a  +  uv)  —  e~ll''+™ll  m{a^uv)  ~  e~""+'"i""''''''m(a  +  ui')  where  t\  :—  2a'v 
and  tq  :=  -||a|p. 

Next,  recall  that  m(a +  uv)7'^~^(t()  is  still  a  unidimensional  log-/3-concave  function  on  u. 
By  Lemma  7  presented  in  Appendix  B,  there  exists  a  unidimensional  logconcave  function  in 
such  that  0m.{u)  <  m{a  +  uv)^'^''^{u)  <  m(u)  for  every  u.  Moreover,  there  exists  numbers 
So  and  si  such  that  ■m(w)  =  sqb^^'^  and  m(w  +  t)—  soe'^'^"'"'^  Due  to  the  log-concavity  of 
?n,  this  implies  that 

m(u)  >  soe^'"    ioT  u  £  {w.w  +  t)    and   iri{u)  <  soe^^^    otherwise. 

Thus,  we  can  replace  m{a  +  uv)'y'^~^{u)  by  Soe*^"  on  the  right  hand  side  of  (A. 24)  and 
replace  m{a  +  uv)f'^~''^{u)  by  /3soe*^"  on  the  left  hand  side  of  (A. 24).   After  defining  ri  = 
i .    Ti  +  si  and  To  :=  tq  +  In  sq,  we  have  ■  ^ 

r+t        .   „      ^  f    /•«        ,   „      .  f\\b-a\\        ,   ^      _       1 

/3  /        e-""+''i"+''odu>  MminM     e-"'+''i "+'''' du,    /  e~""+'"i"+''"(iu  ^       (A.25) 

Jw  \Jo  Jw+t  J 

which  is  equivalent  to 

rw+t  f  f2  (     fw  -  f2  p\\b-a\\  -  ?-         1 

P  e-("-^)'+^°+^du>MminM     e^^""^)  +^°+^du,    /  e-^'^-i'^'+^^+^du}  . 

Jw  \^Jo  Jw+t  J 

(A.26) 


Now,  cancel  the  term  e^^^"^^!^  on  both  sides  and;  since  we  want  the  inequahty  (A. 26)  holding 
for  any  w,  (A. 26)  is  impUed  by 


e'^^'du  >  ^"^  ^     min  <j  /      e'""'du,    /      e  """du  I  (A. 27) 


holding  for  any  w.  This  inequality  is  Lemma  2.2  in  Kannan  and  Li  [23].  For  brevity,  we 
will  not  reproduce  the  proof.       □ 

Proof  of  Corollary  2.  Consider  the  change  of  variables  x  =  '^  Z^ .  Then,  in  x  coordinates, 
/(.t)  =  e^'^m{\/2J~^/~x)  satisfies  the  assumption  of  Lemma  3  and  d{Si,S2)  >  t\/Xmin/2. 
The  result  follows  by  applying  Lemma  3  with  x  coordinates.       D 

Proof  of  Lemma  5.  Define  K  :=  B{0,  R),  so  that  R  is  the  radius  of  K;  also  let  r  :=  4\/da 
(where  a~  <  jqJjji),  and  let  q{x\u)  denote  the  normal  density  function  centered  at  u  with 
covariance  matrix  a'^I.  We  use  the  following  notation:  Bu  =  B{u,r),  By  =  B{v,r),  and 
Au,v  —  B^jHSyn  K.  By  definition  of  r,  we  have  that  /^  q[x\u)dx  =  f^  q{x\v)dx  >  1  — \. 
Define  the  direction  w  =  {v  —  u)/\\v  —  u\\.  Let  Hi  =  {x  £  B^nBy  :  w'{x  —  u)  >  Hu  — w]]/2}, 
H2  =  {x  G  Bu  n  By  :  w'{x  —  u)  <  \\v  —  u\\/2}.  Consider  the  one-step  distributions  fi-om  u 
and  V.  We  have  that 

\\Pu-Pv\\tv     <     1-/      mm{dPu,dPy} 

J  Au.v  : 

=     1-^^      min|.7(.rl.)min|^,l|,g(x|i>)min|^,lUdx 
<     1  -  Pe~^''  mm{q{x\u),q{x\v)}dx 

/        ■  JAu.v 

,     ,.'     ,  <     l-/3e-^'"(/         q{x\u)dx+  f         q{x\v)dx-) 

where  \\u  —  v\\  <  a/S.  Next  we  will  bound  from  below  the  last  sum  of  integrals  for  an 
arbitrary  u  €  K. 

We  first  bound  the  integrals  over  the  possibly  larger  .sets,  respectively  H\  and  Ho-  Let 
h  denote  the  density  function  of  a  univariate  random  variable  distributed  as  N{0,a~).  It 
is  easy  to  see  that  h{t)  =  J^,,_s_^q{x\u)dx,  i.e.  h  is  the  marginal  density  oi  q{.\u)  along 
the  direction  w  (up  to  a  translation).  Let  //s  =  {x  :  —\\u  —  v\\/'2  <  w'{x  -  u)  <  \\v  —  u\\/2}. 
Note  that  B^  C  HiU  {Ho  —  \\u  —  v\vS)  U  iJa  where  the  union  is  disjoint.  Armed  with  these 


observations,  we  have 


q{x\u)dx  +   /     q{x\v)dx     =      /     q{x\u)dx  +   /  q{x\u)dx 

Jh2  J  Hi  JH2-\\u-v\\w 

>  I     q{x\u)dx  —   /     q{x\u)dx 

J  Bu  JH3 

r  r\W-v\\/2 

=      /     q{x\u)dx  —   I  h{t)dt 

JBu  J-\\u-v\\/2 

>  1  -  4  -  /  ^     dt 

e        y_||„_^||/2    \/2na 


>    l---\]u-v\\^^>l~^ =>4A.28) 

e^       8^/2n        10 


1        „  „      1       .    .        1  1 

/27r  cr 
where  we  used  that  \\u  —  v\\  <  a/8  by  the  hypothesis  of  the  lemma. 

In  order  to  take  the  support  K  into  account,  we  can  assume  that  u,v  G  dK,  i.e.  ||m||  = 
||w||  =  R  (otherwise  the  integral  will  be  larger).  Let  z  =  {v  +  u)/'2  and  define  the  half  space 
Hz  =  {x  :  z'x  <  z'z}  whose  boundary  passes  through  u  and  v  (Using  ||u||  =  \\v\\  =  R  it 
follows  that  z'v  =  z'u  =  z'z/2). 

By  the  symmetry  of  the  normal  density,  we  have 


q{x\u)dx  =  -^         q{x\u)dx. 


HiHH^  ^  JHi 

Although  HidH.  does  not  he  in  K  in  general,  simple  arithmetic  shows  that  HinlH^  —  ■^jrliT )  Q 


fill 
A'.  13 


Using  that  /„  w„      r^^  ,  '?(^|^')  =  Jn        h{t)dt,  we  have 


q{x\u)dx     >      I  ^      q{x\v)dx  >    I  q{x\u)dx  —   /  h{t)dt 

1    f  .rV«e-*'/2-= 

>     -  /     q{x\u)dx  -  /  dt 

^  J  Hi  Jo  \J2i\a 


"Indeed,  take  y  £  Hi  n  (if;  -  5?"  "Fir )  •  We  can  write  y  =  y^.  \\^A  +  s,  where  ||s||  <  r  (since 
||j/-  lilii  (m)|  <  lly  -  -II  =  lly  -  ^11  <  \\y  -  "11  +  IIIJ'  ~  '^11  ^  '')  ^"^  s  is  also  orthogo- 
nal to  z.  Since  y  6  (■'^i  ""  a" irfir)'  ^"^  ^^^^  \^^  <  ^h  ~  ^  ~  H^H  ~  ^  -  ^  ~  ^'  Therefore, 
Ibll  =  /(fST^NP  <  ^{R-ff+r^  =  ^JR'  -  2Hf  +  ^  +  r2  <  R. 


where  we  used  that  j^  <  —^  since  r  —  4^/du  and  -^  <  j^. 

By  symmetry,  the  same  inequahty  holds  when  u  and  Hi  are  replaced  by  v  and  H2 
respectively.  Adding  these  inequalities  and  using  (A. 28),  we  have 

f  \        9  4 

q{x\u)dx  +  /  q{x\v)dx     >  —  -  777^  ^  V3- 

HiHK  JH2nK  J        -^U       15\/27r 


Thus,  we  have 

\\Pu-PA<\- 


P-Lr 


3 

and  the  result  follows  since  Lr  <\.  D 

Proof  of  Lemma  1.      Starting  from  an  arbitrary  point  in  K,  assume  that  the  random 
walk  makes  a  proper  move.  If  this  is  the  case  note  that 

max^:Q(^)>o,.4e.4  §4!     <     mB.^,eK0^  {-iirf- Aet{J-')eh-'J-e^^^+'-^--'^- 
The  result  follows  by  invoking  the  CLT  restrictions. 

Next  we  show  that  the  probability  p  of  making  a  proper  move  is  at  least  a  positive 
constant.  We  will  use  the  notation  defined  in  the  proof  of  Lemma  5.  Let  u  be  an  arbitrary 
point  in  K .  We  have  that 

p    =     j^mmi^^y  l^q{x\u)dx>l3e-^'' ^g^^i^q{x\u)dx 
^     /^e-^'-  Sb^^h^  q{x\u)dx  -  /;'/^  h{t)dt  >  i. 

D 
Proof  of  Theorem  5.     Consider  the  sample  mean  defined  by 

1     ^ 

i  =  l 

with  the  underlying  sequence  (A^'-^,  A^'-^, ...,  A^'-^)  produced  by  one  of  the  schemes  (lr,  ss, 
ms)  as  follows: 

•  for  lr,  A*'^  =  A'"*"^,  where  A'^-^  is  produced  by  iterating  the  chain  B  +  i  times 
starting  with  an  initial  draw  fi-om  /q.  Define  the  density  after  B  steps  of  the  chain 
starting  with  /o  by  T^/q.  Thus  A^  has  the  distribution  T^/o,  and  A*+^  has  the 
distribution  T'+^/o. 

•  for  ss,  A*''^  =  A^^^,  where  5  is  the  number  of  draws  that  are  "skipped". 

•  for  ms,  A*'^  are  i.i.d.  draws  from  T^  fo,  i.e.  each  i-th  draw  is  obtained  bj'  sampling 
an  initial  point  from  /o  and  iterating  the  chain  B  times.  .  • 


We  have  that 

MSE{-[1b,n)    =  Etbj^  [M5£(/2b,jv|A^  =  A)]  =  Ej 

=  Ef  [MSE{11b,n\>'^  =  A)]  +  Ef 

<  Ef  [MSE{j1b,n\X'^  =  A)]  +  fEf 

=  {a%/N)  +  2g^T''fo-f\\TV, 


MSE{J1b,n\X^  =  \) 


r^/o(A) 


MSE{i2b,n\>^^  =  A) 

r%(A) 


/(A) 

r^/o(A) 

/(A) 


-  1 


/(A) 


-  1 


where  cr^  yy  is  the  variance  of  tlie  sample  average  under  the  assumption  that  X^  is  distributed 
exactly  according  to  /.  (We  also  used  the  fact  that  \\T^  fo  -  fWrv  =  \\\T^  k  -  /lUi-) 

The  bound  on  cr~  j^  will  depend  on  the  particular  scheme,  as  discussed  below. 

We  require  that  the  second  term  is  smaller  than  ?/3,  which  is  equivalent  to  imposing 
that  \\T^ fo  —  fWrv  <  «%■  Using  Theorem  2,  since  /o  is  a  M-warm  start  for  /, 


x/Mfl-C 


2 

B     > 


<  VMe~ 

<  In 


QVMg- 


In 


Q\/Mg- 


Next  we  bound  cr'^y-  Specifically,  we  determine  the  number  of  post-burn  iterations  Ni^, 
Ngs,  or  A^ms  needed  to  set  the  overall  mean  square  error  less  than  e. 

To  bound  Nir,  note  that  a^  j^  <  <Jg  <  Jo-§i  where  the  last  inequality  follows  from  Corol- 
lary 2.  Thus,  Nij.  =  2a  6   ^^-^^  q  ^^^  above  suffice  to  obtain  MSE{j1b,n)  <  £■ 

To  bound  Nss,  we  first  must  choose  a  spacing  S  to  ensure  that  the  autocovariances  7^ 

71n  <  70  +  2Af7i  <  70  +  2iV7o  (  1 


f- 


where  we  used  Corollary  2  and  that  A''-^  and  A'+^'^  are  spaced  by  S  steps  of  the  chain.  By 
choosing  the  spacing  S  as 


2\5 


<  e"'^^  < 


.,..e.S,l,„(^) 


and  using  Nss  =  ,  the  mean  square  error  for  the  ss  method  can  be  bounded  as 

e 

MSE{j1b,n)     <     j^  (70 +  2Ar,37i)  + 25^^/0 -/IItk 

To  bound  Nms,  we  observed  that  7/,-  =  0  for  all  /c  7^  0  implying  that  A'ISE{'p,B^j\!)  < 
^  +  £/3  <  £  provided  that  Nr,is  >  2jo/{3e).      D 
Proof  of  Theorem  6.     Given 

A'  =  B(0,  IJA'II)    where    ||A'f  =  erf, 

our  condition  Cl  is  satisfied  by  the  argument  given  in  proof  of  Ghosal's  Lemma  4.  Further, 
our  condition  C2  is  satisfied  by  the  argument  given  in  the  proof  of  Ghosal's  Lemma  1  with 
£1=0  and 

1  /     fed  ^    ,^.       cd  ^     ,  , 
^2=0     \  —Bin{0)  +  —B2n{c) 
6  \\   n  n 

and  our  condition  C3  is  satisfied  since  by  E3  and  E4 

02\\Kf-Q.     D 

Comment  A.l.  Ghosal  [17]  proves  his  results  for  the  set  K'  =  B{0,C\/d\ogd).  His 
arguments  actually  go  through  for  the  set  A'  =  B(0,CVd)  due  to  the  concentration  of 
normal  measure  under  d  ^  00  asymptotics.  For  details,  see  [3]. 

Proof  of  Theorem  7.  Take  K  =  B{0,  \\K\\),  where  \\Kf  =  Cdi  for  some  C  sufficiently 
large  independent  of  d  (see  [3]  for  details).  Then  condition  Cl  is  satisfied  by  the  argument 
given  in  the  proof  of  Ghosal's  Lemma  4  and  NE3.  Further,  condition  C2  is  satisfied  by  the 
argument  given  in  the  proof  of  Ghosal's  Lemma  1  and  NE3  with 

£1   =  Op  (C  +  (1  +  52n)SlnVd?j  , 

£2  =  Op  f  <52.  +  <5.?„  +   (  A&1„(0)  +  —B2n{C) 

and  condition  C3  is  satisfied  since  by  E3,  E4,  NE3,  and  NE4, 

e2\\Kf^0.     a 
Comment  A. 2.  For  further  details  and  discussion,  see  [3]. 


Appendix  B.  Bounding  log-/?-concave  functions 

Lemma  7.  Let  /  :  IR  — >  IR  6e  a  unidimensional  log-P -concave  function.    Then  there  exists 
a  logconcave  function  5  ;  IR  -^  IR  such  that 

Pgix)  <  f[x)  <  g{x)    for  every   a;  €  IR. 

Proof.  Consider  h{x)   =  lnf{x)  a  (ln/3)-concave  function.     Now,  let  m  be  the  smallest 
concave  function  greater  than  h{x)  for  every  x,  that  is, 

{k  k  k  -^ 

^\rh{yi)  :  fc  e  N,A  e  IR'',A  >  0,^A,  =  l,^A,yi  =  x\  . 
1=1  1=1  i=\  J 

Recall  that  the  epigraph  of  a  fuirction  w  is  defined  as  epi^,  =  {{x,t)  :  t  <  w{x)}.  Using 
our  definitions,  we  have  that  epim  =  conv(epi^)  (the  convex  hull  of  epi/j),  where  both  sets 
he  in  IR".  In  fact,  the  values  of  m  are  defined  only  by  points  in  the  boundary  of  conv(epi/i). 
Consider  {x,m{x))  e  epim,  since  the  epigraph  is  convex  and  this  point  is  on  the  boundary, 
there  exists  a  supporting  hyperplane  H  on  (x,  m{x)).  Moreover,  [x,  ■m{x))  G  conv {epijiOH). 
Since  H  is  one  dimensional,  (a;,?7i(a;))  can  be  written  as  convex  combination  of  at  most  2 
points  of  epifi. 

Furthermore,  by  definition  of  log-/?-concavity,  we  have  that 

lnl//?>      sup      Xh{y)  +  {l~X)h{z)-h{Xy  +  {l-X)z). 

\e[0,l],y,z 

Thus,  h{x)  <  m{x)  <  h{x)  +ln(l//?).   Exponentiating  gives  f{x)  <  g{x)  <  4/(a;),  where 
g{x)  =  e'"^^)  is  a  logconcave  function.  D 
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